Optimal and Approximate Q-value Functions for Decentralized POMDPs

نویسندگان

Frans A. Oliehoek

Matthijs T. J. Spaan

Nikos A. Vlassis

چکیده

Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (DecPOMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bounded Dynamic Programming for Decentralized POMDPs

Solving decentralized POMDPs (DEC-POMDPs) optimally is a very hard problem. As a result, several approximate algorithms have been developed, but these do not have satisfactory error bounds. In this paper, we first discuss optimal dynamic programming and some approximate finite horizon DEC-POMDP algorithms. We then present a bounded dynamic programming algorithm. Given a problem and an error bou...

متن کامل

Decentralized POMDPs

This chapter presents an overview of the decentralized POMDP (Dec-POMDP) framework. In a Dec-POMDP, a team of agents collaborates to maximize a global reward based on local information only. This means that agents do not observe a Markovian signal during execution and therefore the agents’ individual policies map from histories to actions. Searching for an optimal joint policy is an extremely h...

متن کامل

An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs

Decentralized planning in uncertain environments is a complex task generally dealt with by using a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPS are a general and powerful modeling tool, solving them is a task with an overwhelming complexity that can be doubly exponential. In this paper...

متن کامل

Automated Generation of Interaction Graphs for Value-Factored Decentralized POMDPs

The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful model for multiagent planning under uncertainty, but its applicability is hindered by its high complexity – solving Dec-POMDPs optimally is NEXP-hard. Recently, Kumar et al. introduced the Value Factorization (VF) framework, which exploits decomposable value functions that can be factored into subfunctions....

متن کامل

An Approximate Dynamic Programming Approach to Decentralized Control of Stochastic Systems

In this paper we consider the problem of computing decentralized control policies for stochastic systems with finite state and action spaces. Synthesis of optimal decentralized policies for such problems is known to be NP-hard [15]. Here we focus on methods for efficiently computing meaningful suboptimal decentralized control policies. The algorithms we present here are based on approximation o...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

J. Artif. Intell. Res.

دوره 32 شماره

صفحات -

تاریخ انتشار 2008

Optimal and Approximate Q-value Functions for Decentralized POMDPs

نویسندگان

چکیده

منابع مشابه

Bounded Dynamic Programming for Decentralized POMDPs

Decentralized POMDPs

An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs

Automated Generation of Interaction Graphs for Value-Factored Decentralized POMDPs

An Approximate Dynamic Programming Approach to Decentralized Control of Stochastic Systems

عنوان ژورنال:

اشتراک گذاری